
    Distributed mining of convoys in large scale datasets

    Tremendous growth in the use of mobile devices equipped with GPS and other location sensors has resulted in the generation of a huge amount of movement data. In recent years, mining this data to understand the collective mobility behavior of humans, animals and other objects has become popular. Numerous mobility patterns and algorithms to mine them have been proposed, each representing a specific movement behavior. The convoy pattern is one such pattern; it can be used to find groups of people moving together in public transport or to prevent traffic jams. A convoy is a set of at least m objects moving together for at least k consecutive timestamps, where m and k are user-defined parameters. Existing algorithms for detecting convoy patterns do not scale to real-life dataset sizes. Therefore, in this paper, we propose a generic distributed convoy pattern mining algorithm called DCM and show how such an algorithm can be implemented using the MapReduce framework. We present a cost model for DCM and a detailed theoretical analysis backed by experimental results, and we show the effect of partition size on the performance of DCM. The results of our experiments on different datasets and hardware setups show that our distributed algorithm is scalable in terms of data size and number of nodes, and more efficient than any existing sequential or distributed convoy pattern mining algorithm, with speed-ups of up to 16 times over SPARE, the state-of-the-art distributed co-movement pattern mining framework. DCM is thus able to process large datasets that SPARE cannot.
    Scopus: journal article. Published.
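The abstract does not spell out DCM's map and reduce functions. As a rough illustration of temporal partitioning in a MapReduce style (the strategy the thesis abstract in this list identifies for DCM), the following Python sketch assigns each point record to a fixed-length time partition and groups records per partition, as a shuffle would. The record layout, `partition_len` and all function names are hypothetical; the per-partition convoy mining and the merging of convoys that cross partition boundaries are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout (not DCM's actual format): (object_id, timestamp, x, y)
Record = Tuple[str, int, float, float]


def map_phase(records: Iterable[Record], partition_len: int) -> Iterator[Tuple[int, Record]]:
    """Emit (partition_id, record): timestamps [p*len, (p+1)*len) go to partition p."""
    for rec in records:
        timestamp = rec[1]
        yield timestamp // partition_len, rec


def shuffle(pairs: Iterable[Tuple[int, Record]]) -> Dict[int, List[Record]]:
    """Group mapped records by partition id, as the framework's shuffle would."""
    groups: Dict[int, List[Record]] = defaultdict(list)
    for partition_id, rec in pairs:
        groups[partition_id].append(rec)
    return dict(groups)


def reduce_phase(partition_id: int, recs: List[Record]) -> Tuple[int, Dict[int, List[str]]]:
    """Placeholder reducer: arrange a partition's objects by timestamp.

    A real reducer would cluster each timestamp and mine convoy fragments here;
    fragments touching a partition boundary would still need a merge step.
    """
    by_time: Dict[int, List[str]] = defaultdict(list)
    for obj, t, _x, _y in recs:
        by_time[t].append(obj)
    return partition_id, {t: sorted(objs) for t, objs in sorted(by_time.items())}


if __name__ == "__main__":
    data = [("a", 0, 0.0, 0.0), ("b", 0, 0.5, 0.0), ("a", 3, 1.0, 0.0), ("b", 4, 1.2, 0.1)]
    for pid, recs in sorted(shuffle(map_phase(data, partition_len=3)).items()):
        print(reduce_phase(pid, recs))
```

Because each partition covers a contiguous time range, a reducer sees every object present in its timestamps, which is what makes per-timestamp clustering possible without further data exchange; the price is the boundary merge mentioned in the docstring.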

    k/2-hop: Fast Mining of Convoy Patterns With Effective Pruning

    With the increase in devices equipped with location sensors, mining spatio-temporal data for interesting behavioral patterns has gained attention in recent years. One such well-known pattern is the convoy pattern, which can be used, e.g., to find groups of people moving together in public transport or to prevent traffic jams. A convoy consists of at least m objects moving together for at least k consecutive time instants, where m and k are user-defined parameters. Convoy mining is an expensive task, and existing sequential algorithms do not scale to real-life dataset sizes. Moreover, existing sequential and parallel algorithms require a complex set of data-dependent parameters that are hard to set and tune. Therefore, in this paper, we propose a new, fast, exact sequential convoy pattern mining algorithm, "k/2-hop", that is free of data-dependent parameters. The proposed algorithm processes the data corresponding to a few specific key timestamps at each step and quickly prunes objects that cannot form a convoy. Thus, only a very small portion of the complete dataset is considered for mining convoys. Our experimental results show that k/2-hop outperforms existing sequential and parallel convoy pattern mining algorithms by orders of magnitude, and scales to larger datasets on which existing algorithms fail.
    Scopus: conference paper (proceedings). Published.
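The abstract describes processing only a few key timestamps and pruning objects early. A minimal Python sketch of that idea follows, assuming benchmark points spaced h = ⌊k/2⌋ apart starting at timestamp 0 (the exact spacing and indexing are assumptions based on the algorithm's name) and precomputed snapshot clusters per timestamp (the clustering itself is not shown). With that spacing and k ≥ 2, any k consecutive timestamps contain two consecutive benchmark points, so a convoy's objects must share a cluster at both; intersecting clusters of consecutive benchmark points and keeping intersections of at least m objects therefore yields the only object sets worth examining further. All function names are hypothetical; this is not the authors' implementation.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Cluster = FrozenSet[str]


def benchmark_points(k: int, t_max: int) -> List[int]:
    """Timestamps 0, h, 2h, ... up to t_max, assuming spacing h = k // 2 (k >= 2)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return list(range(0, t_max + 1, k // 2))


def window_has_two_benchmarks(k: int, t_max: int) -> bool:
    """Brute-force check: every window of k consecutive timestamps in [0, t_max]
    contains two consecutive benchmark points."""
    bps = benchmark_points(k, t_max)
    pairs = list(zip(bps, bps[1:]))
    for start in range(0, t_max - k + 2):
        end = start + k - 1
        if not any(start <= a and b <= end for a, b in pairs):
            return False
    return True


def candidate_sets(
    clusters_by_t: Dict[int, List[Cluster]], k: int, m: int, t_max: int
) -> Dict[Tuple[int, int], Set[Cluster]]:
    """For each pair of consecutive benchmark points (a, b), keep the intersections
    of a cluster at a with a cluster at b that have at least m objects.
    Objects in none of these sets cannot belong to a convoy alive at both a and b."""
    bps = benchmark_points(k, t_max)
    result: Dict[Tuple[int, int], Set[Cluster]] = {}
    for a, b in zip(bps, bps[1:]):
        kept: Set[Cluster] = set()
        for c1 in clusters_by_t.get(a, []):
            for c2 in clusters_by_t.get(b, []):
                common = c1 & c2  # frozenset & frozenset -> frozenset
                if len(common) >= m:
                    kept.add(common)
        result[(a, b)] = kept
    return result
```

Only the clusters at benchmark points are touched here, which is where the pruning comes from: objects that never appear in a kept intersection need no clustering at the timestamps in between.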

    Movement Pattern Mining over Large-Scale Datasets

    Movement pattern mining involves processing movement data to understand the mobility behaviour of humans and animals. It has numerous applications, e.g. traffic optimization, event planning, optimization of public transport and carpooling. The recent digital revolution has led to the widespread use of smartphones and other GPS-equipped devices. These devices produce a tremendous amount of movement data that contains valuable mobility information. Many interesting mobility patterns, and algorithms to mine them, have been proposed in recent years to capture different types of mobility behaviour, e.g. convoy, flock, group, swarm and platoon. However, the drastic increase in the volume of data being generated limits the use of these algorithms on real-world data sizes because they lack scalability.
This thesis deals with three aspects of movement pattern mining, namely scalability, efficiency and real-timeliness, with a focus on convoy pattern mining. A convoy pattern is a group of objects moving together for a certain period. Mining convoy patterns involves clustering the movement dataset at each timestamp and then merging the clusters to form convoys. Clustering the whole dataset limits the scalability of existing algorithms. One way to solve the scalability problem is to mine convoys in parallel, either using an existing distributed spatio-temporal data processing system such as Parallel Secondo or using a general-purpose distributed data processing system. We first test the scalability of Parallel Secondo for mining movement patterns and conclude that it is not an industrial-grade system and that its scalability is limited. An essential part of designing distributed data processing algorithms is the data partitioning strategy. We study three data partitioning strategies: object-based, spatial and temporal.
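The two-step process described above (cluster each timestamp, then merge the clusters into convoys) can be sketched in Python in the spirit of the classic candidate-intersection approach. This is a minimal illustration, assuming snapshot clusters are already computed and passed as a list indexed by timestamp; it is not the thesis's algorithm, and `mine_convoys` is a hypothetical name. A candidate is an object set together with its start timestamp; at each timestamp it is intersected with every cluster, and intersections of at least m objects survive. A candidate whose exact object set does not continue is reported if it lasted at least k timestamps.

```python
from typing import Dict, FrozenSet, List, Tuple

Cluster = FrozenSet[str]
Convoy = Tuple[Cluster, int, int]  # (objects, first timestamp, last timestamp)


def mine_convoys(clusters_by_t: List[List[Cluster]], m: int, k: int) -> List[Convoy]:
    """Hypothetical sketch: merge per-timestamp clusters into convoys of
    >= m objects lasting >= k consecutive timestamps."""
    convoys: List[Convoy] = []
    candidates: Dict[Cluster, int] = {}  # object set -> start timestamp
    for t, clusters in enumerate(clusters_by_t):
        extended: Dict[Cluster, int] = {}
        for objs, start in candidates.items():
            for cluster in clusters:
                common = objs & cluster
                # Keep the earliest start if several candidates shrink to the same set.
                if len(common) >= m and start < extended.get(common, t + 1):
                    extended[common] = start
        # Every sufficiently large cluster may also start a new candidate at t.
        for cluster in clusters:
            if len(cluster) >= m and cluster not in extended:
                extended[cluster] = t
        # Candidates whose exact object set did not continue ended at t - 1.
        for objs, start in candidates.items():
            if objs not in extended and (t - 1) - start + 1 >= k:
                convoys.append((objs, start, t - 1))
        candidates = extended
    # Flush candidates still alive after the last timestamp.
    last = len(clusters_by_t) - 1
    for objs, start in candidates.items():
        if last - start + 1 >= k:
            convoys.append((objs, start, last))
    return convoys
```

Note that every timestamp's full clustering is consumed here, which is exactly the cost the abstract identifies as the scalability bottleneck.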
We analyze their suitability for convoy pattern mining based on five properties: data exchange, data redundancy, partitioning cost, disk seeks and data ordering. Our study shows that the temporal partitioning strategy is best suited for convoy mining, as it is easily parallelizable and less complicated. The observations in our study also apply to other movement pattern mining algorithms, e.g. flock, group or platoon.
Based on the temporal partitioning strategy, we propose a generic distributed shared-nothing convoy mining algorithm called DCM, which scales linearly with data size, data density and the number of nodes. DCM can be implemented using any distributed data processing framework; for our experiments, we implemented it using the Hadoop MapReduce framework. It outperforms the existing sequential algorithms, i.e. the CuTS family of algorithms, by an order of magnitude on different computing architectures, e.g. a single x86 machine, a multi-core cluster with NUMA architecture and multi-node SMP clusters. Although DCM is a scalable distributed algorithm that can process huge datasets, the cost of maintaining the cluster is high. Also, the heavy computation it incurs, because it must cluster the whole dataset, is not resource-efficient.
To solve the efficiency problem of DCM, we propose a new sequential algorithm called k/2-hop which, although sequential, performs orders of magnitude faster than the existing state-of-the-art sequential and distributed algorithms. The main strength of the algorithm is its pruning capability: our experiments show that it can prune up to 99% of the data. k/2-hop uses a notion of benchmark points, which are timestamps separated by k/2 timestamps, where k is the minimum length of the convoys to be mined. We prove that, to mine maximal convoys, we need to cluster only the data belonging to the benchmark points.
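The abstract states that benchmark points are separated by k/2 timestamps. Assuming integer spacing $h = \lfloor k/2 \rfloor$ with $k \ge 2$ and benchmark points at the multiples of $h$ (the exact indexing is an assumption), a short counting argument suggests why this spacing suffices: any convoy lasting at least $k$ timestamps is alive at two consecutive benchmark points.

```latex
\begin{aligned}
&\text{Let a convoy live on } [s,\, s+k-1], \quad h = \lfloor k/2 \rfloor \ge 1, \quad B = \{0, h, 2h, \dots\}.\\
&b_1 = h \lceil s/h \rceil \le s + h - 1, \qquad b_2 = b_1 + h \le s + 2h - 1 \le s + k - 1 .\\
&\Rightarrow\ b_1, b_2 \in [s,\, s+k-1], \text{ so the convoy's objects share a cluster at both } b_1 \text{ and } b_2 .
\end{aligned}
```

Intersecting the clusters of consecutive benchmark points therefore cannot lose a convoy of length at least $k$; this is only the intuition behind the result, not the thesis's formal proof.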
For the timestamps between two consecutive benchmark points, we propose an efficient mining algorithm called the Hop Window Mining Tree (HWMT). HWMT clusters the data of only those objects that are part of a cluster at the benchmark points. k/2-hop is a batch algorithm that mines convoys very fast, but results are available only after the complete dataset has been processed. It also requires the data to be indexed for good performance and thus cannot be used in real-time scenarios. We therefore propose a streaming variant of k/2-hop that does not require the input dataset to be indexed and can process a stream of data, outputting mined convoys as soon as they are discovered. Streaming k/2-hop is very memory-efficient and can process data many times larger than the memory made available to it. Our experiments show that, if data loading and indexing time are included in the runtime of k/2-hop, streaming k/2-hop is the fastest convoy mining algorithm to date. The convoy pattern belongs to a larger category of co-movement patterns, and most (if not all) of the observations made in this thesis about convoy pattern mining also apply to other patterns in this category, such as flock, group or platoon. Consequently, a generic batch and streaming distributed co-movement pattern mining framework can be built using the k/2-hop technique.
Doctorate in Engineering Sciences and Technology (Doctorat en Sciences de l'ingénieur et technologie). Unpublished.
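The core idea behind HWMT, clustering between benchmark points only the objects already found together at the benchmark points, can be illustrated with a small Python sketch. It assumes per-timestamp positions stored in a dict and uses single-link grouping (objects within distance `eps` are connected) as a stand-in for the density-based snapshot clustering used in convoy mining; `refine_in_window`, `group_objects` and `eps` are hypothetical names. Because only candidate objects are grouped, two candidates connected only through a non-candidate object in the full snapshot end up in separate groups here; the sketch ignores that subtlety and does not reproduce HWMT's tree structure.

```python
import math
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Point = Tuple[float, float]


def group_objects(objs: FrozenSet[str], pos: Dict[str, Point], eps: float) -> List[FrozenSet[str]]:
    """Single-link grouping of objs: connected components under distance <= eps."""
    items = sorted(objs)
    parent = {o: o for o in items}

    def find(o: str) -> str:
        while parent[o] != o:
            parent[o] = parent[parent[o]]  # path halving
            o = parent[o]
        return o

    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if math.dist(pos[a], pos[b]) <= eps:
                parent[find(a)] = find(b)
    groups: Dict[str, set] = defaultdict(set)
    for o in items:
        groups[find(o)].add(o)
    return [frozenset(g) for g in groups.values()]


def refine_in_window(
    candidate: FrozenSet[str],
    positions_by_t: Dict[int, Dict[str, Point]],
    a: int,
    b: int,
    m: int,
    eps: float,
) -> List[FrozenSet[str]]:
    """Return subsets of candidate (>= m objects) that stay grouped at every t in [a, b].

    Only the candidate's objects are grouped at each timestamp, never the full snapshot.
    """
    current = [candidate]
    for t in range(a, b + 1):
        survivors = set()
        for objs in current:
            for g in group_objects(objs, positions_by_t[t], eps):
                if len(g) >= m:
                    survivors.add(g)
        current = sorted(survivors, key=sorted)
        if not current:
            break
    return current
```

The work per timestamp grows with the candidate's size rather than with the snapshot's size, which is where the savings over clustering the whole dataset would come from.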

    Distributed convoy pattern mining

    Due to the widespread use of mobile devices equipped with location sensors, the amount of mobility data being generated is enormous. Mining this data to reveal interesting behavioral patterns has gained attention in recent years. Various mobility patterns describing collective mobility behaviour have been proposed. One such pattern is the convoy pattern, which can be used to find groups of people moving together in public transport or to prevent traffic jams. A convoy consists of at least m objects moving together for at least k consecutive time instants, where m and k are user-defined parameters. Existing algorithms for detecting convoy patterns, however, do not scale to real-life dataset sizes. Therefore, in this paper, we propose a generic distributed convoy pattern mining algorithm and show how such an algorithm can be implemented using the MapReduce framework. Our experimental results show that our distributed algorithm is scalable and more efficient than the existing sequential convoy pattern mining algorithms.
    Scopus: conference paper (proceedings). Published.

    Large-Scale Graph Processing Using Apache Giraph
